Sentence complexity as an indicator 
of L2 learner’s listening difficulty 
Abstract. This paper investigates the effect of sentence complexity, specifically 
lexical and syntactic surprisal, on L2 listening difficulty. Psycholinguistic studies 
have shown that surprisal cases correlate with text comprehension difficulty. According 
to surprisal theory, these cases are less probable, and hence less expected, given the 
preceding context, and thus require more complex processing to comprehend. Little is 
known about the influence of surprisal on L2 listening comprehension. 
We aim to examine this effect and propose including these cases in captions to 
assist L2 listeners. Since conventional captions include the whole transcript, we use 
Partial and Synchronized Caption (PSC), which presents limited textual clues and so 
allows surprisal cases to be highlighted to reduce ambiguity. In our experiment, intermediate 
learners of English (undergraduates) were asked to transcribe and paraphrase videos 
containing surprisal cases. Results revealed that learners faced difficulty when 
encountering surprisal, which was partially addressed with the help of PSC, yet 
more assistance was required. 
1. Introduction 


Investigating appropriate methods to teach L2 listening is a continuing concern, 
given that listening has long been considered a passive skill (Osada, 2004). 
Several factors are known to make L2 listening difficult, including acoustic, lexical, 
syntactic, and content-related features (Bloomfield et al., 2010). Previous research 
has investigated the influence of syntactic features on reading difficulty, but this 
aspect has not been adequately considered in L2 listening. One of the elements involved 
in sentence complexity is surprisal, which relates to the predictability of a word 
in its context, with a highly probable word being easier to process. According 
to the expectation-based model of syntactic comprehension, one measures the 
probability of the next input based on the preceding context (Levy, 2008). Studies 
using fMRI, EEG, and eye-tracking provide evidence for the effect of surprisal on 
working memory load, reading time, and comprehension (Smith & Levy, 2013). 
However, little is known about how this factor affects L2 listening. 


In this study, we investigate whether syntactic and lexical surprisal affect L2 
listening difficulty and propose including this factor in PSC to facilitate 
listening. PSC is a captioning tool that automatically detects difficult words/phrases, 
includes them in the caption, and removes trivial cases (partial). It also synchronizes 
each word with the relevant speech segment (word-level synchronization). PSC 
aims to decrease dependence on the caption and promote listening over reading 
(Mirzaei, Meshgi, & Kawahara, 2018). PSC currently uses only acoustic factors (speech rate, 
breached boundaries, acoustically similar words) and lexical factors (word frequency, 
specificity); sentence complexity is not addressed. In this 
study, we focus on syntactic surprisal, measured as the structural confusion of a sentence 
detected by a probabilistic grammar/parser. We also measure lexical surprisal 
from the probability of the next word under a corpus-based N-gram model. The 
words with high surprisal scores are selected for inclusion in PSC. 


Figure 1. Screenshot of a surprisal case appearing in PSC 


[Screenshot: the caption shows only “unprepared” and “tsunami of doubt”; the remaining words are replaced by dots.] 


Original sentence: “Scientists were unprepared for this tsunami of doubt and questions and distrust.” 
Heidi Larson | TEDMED2020 
Figure 1 above shows a TED talk with PSC that includes a surprisal case ([tsunami 
of] doubt). The selective nature of PSC allows for highlighting challenging 
aspects of listening. By adding surprisal, we aim to facilitate recognition and 
comprehension, decrease cognitive load, and foster ambiguity resolution. 
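
The partial display in Figure 1 can be imitated with a few lines of Python. This is only an illustrative sketch: the function name `render_partial_caption`, the dot-per-character masking, and the hand-picked set of shown word positions are our assumptions, not PSC’s actual implementation, which selects words automatically and also synchronizes them with the speech.

```python
def render_partial_caption(words, shown, placeholder="."):
    """Return a caption in which hidden words become runs of dots.

    words: list of word tokens from the transcript.
    shown: set of word positions to keep visible (hypothetical selection).
    Hidden words are masked with one placeholder per character (an assumption).
    """
    rendered = []
    for index, word in enumerate(words):
        if index in shown:
            rendered.append(word)
        else:
            rendered.append(placeholder * len(word))
    return " ".join(rendered)


sentence = ("Scientists were unprepared for this tsunami of doubt "
            "and questions and distrust.")
words = sentence.split()
# Positions of "unprepared" and "tsunami of doubt", chosen by hand here.
shown = {2, 5, 6, 7}
caption = render_partial_caption(words, shown)
print(caption)
```

In a real system the `shown` set would come from the difficulty and surprisal detectors rather than being written by hand.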


2. Lexical and syntactic surprisal for L2 listening 


Lexical and syntactic surprisal correlate strongly with the effort required 
for parsing and processing sentences (e.g. the garden-path sentence “the horse raced past the barn 
fell”). This notion is based on surprisal theory (Levy, 2008), which assumes that a 
word’s predictability can determine its processing difficulty. In this view, the cognitive effort it 
takes for the learner to process a word is proportional to its surprisal (Hale, 2001). 
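
For readers who prefer symbols, the standard definition of surprisal can be written as below. The notation ($S$, $w_i$, $\alpha_i$) is chosen here for illustration and is not taken from the source; the logarithm base only changes the unit (base 2 gives bits).

```latex
% Surprisal of the i-th word given the words that precede it
S(w_i) = -\log P(w_i \mid w_1, \dots, w_{i-1})

% Linking assumption of surprisal theory: processing effort grows with surprisal
\mathrm{effort}(w_i) \propto S(w_i)

% With a probabilistic grammar, the same quantity expressed through
% prefix probabilities, where \alpha_i is the total probability of all
% parses compatible with the prefix w_1 ... w_i
S(w_i) = \log \frac{\alpha_{i-1}}{\alpha_i}
```

The two expressions for $S(w_i)$ agree because $\alpha_i / \alpha_{i-1}$ is the conditional probability of $w_i$ given the preceding words.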


Speech is transient, and we can assume that when a learner encounters a word that 
differs from what they expect to hear, their attention becomes fixed on it, leading to 
confusion, cognitive overload, and misrecognition. A similar situation can arise 
when a learner tries to match a preferred sentence structure to the incoming speech and 
finds a mismatch. To investigate this hypothesis, we included words with a 
high surprisal score in the PSC generated for TED talks, which served as material for 
L2 listeners (Figure 2). 


Figure 2. The procedure for generating PSC including surprisal cases 
We used N-gram surprisal and PCFG (probabilistic context-free grammar) surprisal to detect lexical and structural 
surprisal cases. An N-gram model was estimated on a TED corpus using KenLM, and 
lexical surprisal was calculated as the negative log probability of a word given 
the previous N-1 words. A PCFG-based incremental parser (van Schijndel, Exley, 
& Schuler, 2013) was employed to determine the dependency relations of the previous 
words. Similar to how humans comprehend input, an incremental parser 
integrates each incoming word into a syntactic structure that fits the preceding context. Surprisal arises 
when the input word changes the probability distribution over the possible parses, 
that is, the parser’s expectation about the underlying syntax. Each word of an 
N-gram ending with a lexically surprising word is included in PSC to facilitate recognition 
and avoid surprises. For syntactic surprisal, the part of the sentence whose parse tree 
changes drastically when the latest input is processed is considered surprising and 
is shown in the caption all at once. 
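
The lexical part of this procedure can be sketched as follows. Note the substitutions: instead of a KenLM model trained on a TED corpus, the sketch uses a toy bigram model with add-one smoothing built from three made-up sentences, and the threshold, the N-gram order, and all function names are hypothetical choices for illustration. The source does not state the values that were actually used.

```python
import math
from collections import Counter

START = "<s>"      # sentence-start symbol (used only as a history)
UNKNOWN = "<unk>"  # stands in for words never seen in training


def train_bigram_model(sentences):
    """Count histories and bigrams over a list of tokenized sentences."""
    history_counts = Counter()
    bigram_counts = Counter()
    vocabulary = {UNKNOWN}
    for tokens in sentences:
        vocabulary.update(tokens)
        padded = [START] + list(tokens)
        for previous, current in zip(padded, padded[1:]):
            history_counts[previous] += 1
            bigram_counts[(previous, current)] += 1
    return history_counts, bigram_counts, vocabulary


def lexical_surprisal(tokens, history_counts, bigram_counts, vocabulary):
    """Surprisal in bits of each token given the previous token."""
    mapped = [t if t in vocabulary else UNKNOWN for t in tokens]
    padded = [START] + mapped
    scores = []
    for previous, current in zip(padded, padded[1:]):
        # Add-one smoothing keeps unseen bigrams at a non-zero probability.
        probability = (bigram_counts[(previous, current)] + 1) / (
            history_counts[previous] + len(vocabulary))
        scores.append(-math.log2(probability))
    return scores


def spans_to_show(scores, threshold, order):
    """Positions of every word in an N-gram that ends with a surprising word."""
    shown = set()
    for index, score in enumerate(scores):
        if score >= threshold:
            shown.update(range(max(0, index - order + 1), index + 1))
    return shown


# Toy training data (invented for this example, not the TED corpus).
corpus = [["a", "wave", "of", "water"],
          ["a", "wave", "of", "water"],
          ["a", "tsunami", "of", "water"]]
model = train_bigram_model(corpus)

test_tokens = ["a", "tsunami", "of", "doubt"]
scores = lexical_surprisal(test_tokens, *model)
shown = spans_to_show(scores, threshold=3.0, order=3)
print([test_tokens[i] for i in sorted(shown)])
```

With these toy counts, “doubt” is the only word above the threshold, so the trigram ending in it (“tsunami of doubt”) is selected, mirroring the case in Figure 1. Syntactic surprisal would require an incremental parser and is not covered by this sketch.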


3. Preliminary evaluation and discussion 


Our participants were 17 Japanese and nine Chinese engineering undergraduates 
who were intermediate learners of English (aged 19-21, with TOEIC (Test of English for 
International Communication) scores of 520-725, approximately CEFR B1). We selected 20 clips 
from TED videos, 37 seconds long on average, 
each including one surprisal case (ten lexical and ten syntactic cases in total) and one easy 
case (control). The easy cases were selected using PSC’s difficulty measures (i.e. 
words that are automatically omitted for being trivial). When selecting the surprisal 
cases, we made sure that acoustic-related difficulties were not present. 


The participants were asked to watch each video segment and, when the video paused, 
to fill in the blanks for two or three words in the last sentence. Subsequently, they 
were asked to paraphrase that sentence. The purpose was to check how accurately 
the learners could transcribe and paraphrase easy versus surprisal cases. 


Figure 3 shows participants’ correct and incorrect answers on easy versus surprisal 
cases as well as their scores on the paraphrasing task. As the figure suggests, 
participants could transcribe the easy cases significantly better than the surprisal 
ones. The data show that learners faced slightly more difficulty in transcribing lexical 
surprisal cases than syntactic ones; however, this difference was not 
statistically significant (p=0.35). In the paraphrasing task, learners’ scores indicate 
that syntactic surprisal cases were associated with more difficulty. It can be argued that 
lexical surprisal leads to more misrecognition (e.g. [tsunami of] ‘doubt’, transcribed 
as ‘that’), while syntactic surprisal makes comprehension more difficult. 


Finally, to check whether including surprisal cases in PSC can assist learners 
with listening, we asked the participants to paraphrase the target segment again 
after watching it with enhanced PSC (showing surprisal cases). The results, 
shown in Figure 4, indicate that PSC significantly facilitated 
comprehension for lexical surprisal cases (p<0.05). Although including syntactic 
cases in PSC resulted in better scores, the improvement was not significant 
(p=0.06). Moreover, participants’ overall scores suggest that better scaffolding 
is necessary to help them improve their performance. This finding suggests that 
merely showing these cases in PSC was not adequate for alleviating comprehension 
difficulty. Thus, a better method should be considered to help learners comprehend 
structural surprisal. Generating shorter or simplified sentences and presenting 
them along with the original one in PSC could be one way to address this issue. 
Furthermore, repeating the experiment with control and treatment groups and 
with learners of different proficiency levels could provide insights for designing better tools. 


Figure 3. Participants’ scores on transcription and paraphrasing of easy versus 
surprisal cases 
Figure 4. Participants’ scores on the paraphrasing task with/without using PSC 
4. Conclusions 


We investigated the influence of syntactic and lexical surprisal on L2 learners’ 
listening and found that the presence of surprisal cases leads to difficulty in 
recognition (cloze-test transcription) and comprehension (paraphrasing test) of 
the speech input. The findings revealed that including these cases in PSC is 
more helpful for lexical surprisal cases than for structural ones. However, further 
evaluation is necessary to determine in what ways including these cases in PSC can 
foster listening. Additionally, more conclusive results could be obtained by using 
eye-trackers to investigate learners’ fixations and cognitive load when surprisal cases 
are presented in the caption. Future work should consider a more effective method 
to address these cases. Simplifying the syntactic surprisal cases and adding 
the simplified versions to the caption could be one approach to consider. 